Advances in Data-Driven Analysis and Synthesis of 3D Indoor Scenes
This report surveys advances in deep learning-based modeling techniques that
address four different 3D indoor scene analysis tasks, as well as synthesis of
3D indoor scenes. We describe different kinds of representations for indoor
scenes, various indoor scene datasets available for research in the
aforementioned areas, and discuss notable works employing machine learning
models for such scene modeling tasks based on these representations.
Specifically, we focus on the analysis and synthesis of 3D indoor scenes. With
respect to analysis, we focus on four basic scene understanding tasks -- 3D
object detection, 3D scene segmentation, 3D scene reconstruction and 3D scene
similarity. For synthesis, we mainly discuss neural scene synthesis works,
while also highlighting model-driven methods that allow for human-centric,
progressive scene synthesis. We identify the challenges involved in modeling
scenes for these tasks and the kind of machinery that needs to be developed to
adapt to the data representation, and the task setting in general. For each of
these tasks, we provide a comprehensive summary of the state-of-the-art works
across different axes such as the choice of data representation, backbone,
evaluation metric, input, output, etc., providing an organized review of the
literature. Towards the end, we discuss some interesting research directions
that have the potential to make a direct impact on the way users interact and
engage with these virtual scene models, making them an integral part of the
metaverse.
Comment: Published in Computer Graphics Forum, Aug 202
Active Coarse-to-Fine Segmentation of Moveable Parts from Real Images
We introduce the first active learning (AL) framework for high-accuracy
instance segmentation of moveable parts from RGB images of real indoor scenes.
As with most human-in-the-loop approaches, the key criterion for success in AL
is to minimize human effort while still attaining high performance. To this
end, we employ a transformer that utilizes a masked-attention mechanism to
supervise the active segmentation. To tailor the network to moveable
parts, we introduce a coarse-to-fine AL approach which first uses an
object-aware masked attention and then a pose-aware one, leveraging the
hierarchical nature of the problem and a correlation between moveable parts and
object poses and interaction directions. Our method achieves close to fully
accurate (96% and higher) segmentation results, with semantic labels, on real
images, with 82% time saving over manual effort, where the training data
consists of only 11.45% annotated real photographs. Finally, we contribute a
dataset of 2,550 real photographs with annotated moveable parts, demonstrating
its superior quality and diversity over the current best alternatives.
RoSI: Recovering 3D Shape Interiors from Few Articulation Images
The vast majority of 3D models that appear in gaming and VR/AR, as well as
those we use to train geometric deep learning algorithms, are incomplete, since
they are modeled as surface meshes and lack interior structures. We present a
learning framework to recover the shape interiors (RoSI) of existing 3D models
with only their exteriors from multi-view and multi-articulation images. Given
a set of RGB images that capture a target 3D object in different articulated
poses, possibly from only a few views, our method infers the interior planes that
are observable in the input images. Our neural architecture is trained in a
category-agnostic manner and it consists of a motion-aware multi-view analysis
phase including pose, depth, and motion estimations, followed by interior plane
detection in images and 3D space, and finally multi-view plane fusion. In
addition, our method also predicts part articulations and is able to realize
and even extrapolate the captured motions on the target 3D object. We evaluate
our method by quantitative and qualitative comparisons to baselines and
alternative solutions, as well as testing on untrained object categories and
real image inputs to assess its generalization capabilities.
Tone mapping HDR images using local texture and brightness measures
by Akshay Gadi Patil and Shanmuganathan Rama
Automatic content-aware non-photorealistic rendering of images
Non-photorealistic rendering techniques work on image features and often manipulate a set of characteristics such as edges and texture to achieve a desired depiction of the scene. Most computational photography methods decompose an image using edge-preserving filters and work on the resulting base and detail layers independently to achieve desired visual effects. We propose a new approach for content-aware non-photorealistic rendering of images where we manipulate the visually salient and non-salient regions separately. We propose a novel content-aware framework in order to render an image for applications such as detail exaggeration, artificial smoothing, and image abstraction. The processed regions of the image are blended seamlessly with the rest of the image for all these applications. We demonstrate that content awareness of the proposed method leads to automatic generation of non-photorealistic rendering of the same image for the different applications mentioned above.
by Akshay Gadi Patil and Shanmuganathan Rama